 counterfactual prediction





PAC: Assisted Value Factorization with Counterfactual Predictions in Multi-Agent Reinforcement Learning

Neural Information Processing Systems

Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods, which allow a joint action-value function to be optimized by maximizing factorized per-agent utilities. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions can impose concurrent constraints (across different states) on the representable function class, causing significant estimation errors during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline. To enable decentralized execution, we also derive factorized per-agent policies inspired by a maximum-entropy MARL framework. We evaluate PAC on multi-agent predator-prey and a set of StarCraft II micromanagement tasks. Empirical results show that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
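
For readers unfamiliar with value function factorization, the idea of optimizing a joint action-value function by maximizing per-agent utilities can be illustrated with a minimal sketch. It assumes the simplest additive (VDN-style) mixing, which is not PAC's own architecture; the function names and the utility values below are hypothetical and chosen only for illustration.

```python
from itertools import product


def greedy_joint_action(utilities):
    """Each agent independently picks the action with its highest utility."""
    return tuple(max(range(len(q)), key=q.__getitem__) for q in utilities)


def q_tot(utilities, joint_action):
    """Additive mixing (an assumption): joint value = sum of per-agent utilities."""
    return sum(q[a] for q, a in zip(utilities, joint_action))


def brute_force_joint_action(utilities):
    """Search every joint action for the maximizer of q_tot."""
    joint_actions = product(*(range(len(q)) for q in utilities))
    return max(joint_actions, key=lambda ja: q_tot(utilities, ja))


# Hypothetical utilities for three agents with 2, 3, and 2 actions.
utilities = [[0.1, 0.7], [0.4, -0.2, 0.9], [0.3, 0.0]]

decentralized = greedy_joint_action(utilities)
centralized = brute_force_joint_action(utilities)
print(decentralized, centralized)
```

Under additive mixing the two selections coincide, which is what makes decentralized greedy execution consistent with the centralized objective. The abstract's point is that such factorized function classes can be restrictive under partial observability; this sketch only shows the consistency property, not that limitation.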



We thank all the reviewers for their constructive comments

Neural Information Processing Systems

We thank all the reviewers for their constructive comments. Making predictions directly on a pixel level without the intermediate structures won't be […] Still, we follow the reviewers' suggestion by including an additional baseline that predicts directly over the pixels. The above figure shows the results. Dreamer's prediction deviates from the ground truth and quickly becomes blurry, […] Baselines, even with graph-structured prediction models, cannot cope with such out-of-distribution generalization. Applicability of the proposed method (R4, R1).


Counterfactual Probabilistic Diffusion with Expert Models

Mu, Wenhao, Cao, Zhi, Uludag, Mehmed, Rodríguez, Alexander

arXiv.org Artificial Intelligence

Predicting counterfactual distributions in complex dynamical systems is essential for scientific modeling and decision-making in domains such as public health and medicine. However, existing methods often rely on point estimates or purely data-driven models, which tend to falter under data scarcity. We propose a time series diffusion-based framework that incorporates guidance from imperfect expert models by extracting high-level signals to serve as structured priors for generative modeling. Our method, ODE-Diff, bridges mechanistic and data-driven approaches, enabling more reliable and interpretable causal inference. We evaluate ODE-Diff across semi-synthetic COVID-19 simulations, synthetic pharmacological dynamics, and real-world case studies, demonstrating that it consistently outperforms strong baselines in both point prediction and distributional accuracy.